*Constructs instruments for the donut estimation (Table B.5, column 5)
*Also sets the upper-tier EoS (local sigma_base), which enters as an input
*Also computes price indices, which enter as an input
*Also constructs independent and dependent variables



*bilateral trade data
use oecd_gravity_data_intermediates.dta, clear
sort year originId sectorId

save temp.dta, replace

*Our Elasticities
*Using the GYY estimates
use theta_estimates_gyy.dta, clear
rename theta_gyy_pml theta
keep sectorId theta se_gyy

*In the baseline, replace theta for non-manufacturing sectors (sectorId<3)
*with the median elasticity across the remaining sectors
replace theta=. if sectorId<3
egen theta_non = median(theta)
replace theta=theta_non if sectorId<3
drop theta_non

merge 1:m sectorId using temp.dta
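*Optional diagnostic, not part of the original estimation: tabulate the merge result
*to see whether any theta has no trade data (_merge==1) or any trade observation
*lacks a theta estimate (_merge==2)
tabulate _merge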
drop _merge

*Bringing in Nunn Data
merge m:1 countryISO using nunn_aggregate.dta
drop if _merge==2
drop _merge

sort originId destId sectorId year
*drop non-traded sectors
keep if sectorId<18
save temp.dta, replace

*Need to bring in geographic data for this exercise (distance etc)
local keepdistancevars D_distw D_contig D_self
merge m:1 originId destId year using bilateral_distance_oecd.dta, ///
	nogen keep(match) keepusing(`keepdistancevars')
	
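*Constructing donuts
*Create dummy variables that take the value zero for origin-destination pairs
*closer than some specified distance (own-country pairs always get a zero)
*Distance cutoffs are percentiles of D_distw over the pooled sample (egen without by())
*As coded below:
*Donut 0 equals one for own-country pairs and is missing for all other pairs
*Donut 1 excludes only the own country
*Donut 2 excludes the bottom 25% of distances (plus the own country)
*Donut 3 excludes the bottom 50% of distances (plus the own country)
*pctile10 is computed but not used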

replace D_distw=0 if D_self==1
gen donut0=0 if D_self==1
replace donut0=1 if donut0==0

egen pctile10=pctile(D_distw), p(10)
egen pctile25=pctile(D_distw), p(25)
egen pctile50=pctile(D_distw), p(50)

gen donut1 = 0 if D_distw==0
replace donut1=1 if donut1==.
gen donut2 = 0 if D_distw-pctile25<0
replace donut2=1 if donut2==.
gen donut3 = 0 if D_distw-pctile50<0
replace donut3=1 if donut3==.
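*Optional diagnostic, not part of the original estimation: the mean of each dummy is the
*share of observations kept in that donut, and Obs shows how many values are non-missing
summarize donut0 donut1 donut2 donut3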

drop D_self D_contig D_distw


**************************************************************************
*Constructing Sectoral Price Indices
*************************************************************

*Taking logs
gen logX=log(X)
drop if logX==.
egen E = sum(X), by(destId sectorId year)
gen logE = log(E)

save temp.dta, replace

*Average trade shares, adjusted by the trade elasticity for each pair-sector-year
gen ts_theta = (logX-logE)/theta
*Price index for each destination-sector-year
egen logprice_base=mean(ts_theta), by(year destId sectorId)

*Now doing the donut versions
*(pairs with a donut dummy of zero enter the mean as zeros; they are not dropped)
forvalues x=0/3 {
	gen ts_theta`x' = (logX-logE)/theta*donut`x'
	egen logprice_`x'=mean(ts_theta`x'), by(year destId sectorId)
}

	
*Note: we now have the various estimated log prices for each destination-sector-year,
*but to construct instruments we need to associate them with origins.
*Keep own-country pairs so that each destination's prices can be relabeled as an origin's below

keep if originId==destId

keep year sectorId destId logprice* logE pop
sort year destId sectorId
duplicates drop
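*Optional diagnostic, not part of the original estimation: the merge 1:m further below
*needs one row per destination-sector-year; this reports a violation without stopping the run
capture noisily isid year destId sectorId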
 
gen logpop = log(pop)
egen clusterId=group(destId)
 
**************************************************************************
*Building the CES instruments

*Sigmas
local sigma_base=.87

*Baseline
gen comp1 = exp(logE)/(exp(logprice_base))^(1-`sigma_base') if sectorId>2
egen comp=sum(comp1),by(destId  year)
gen beta_base = comp1/comp
gen instrument_base = log(beta_base)+logpop
drop comp1 comp

*Instruments (log L + log beta, where beta is the demand residual)
forvalues x=0/3 {
	gen comp1 = exp(logE)/(exp(logprice_`x'))^(1-`sigma_base') if sectorId>2
	egen comp=sum(comp1),by(destId  year)
	gen beta_`x' = comp1/comp
	gen instrument_`x' = log(beta_`x')+logpop
	drop comp1 comp
}
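
*Optional diagnostic, not part of the original estimation: by construction beta_base should
*sum to one across sectors within each destination-year that has a positive denominator
*(beta_check is a hypothetical scratch variable, dropped immediately)
egen double beta_check = total(beta_base), by(destId year)
capture noisily assert abs(beta_check-1)<1e-4 if beta_check>0
drop beta_check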

*merging back with original data

keep sectorId year destId instrument*
rename destId originId

merge 1:m originId sectorId year using temp.dta, ///
	nogen
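
*Optional diagnostic, not part of the original estimation: count sectorId>2 rows left
*without a baseline instrument after the merge (for example, origins with no
*own-country observation in that sector-year)
count if missing(instrument_base) & sectorId>2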

save int_reg_data_donuts.dta, replace


***********************************************************************************************
*Generating independent and dependent variables under various assumptions
****

*Dependent variable for the baseline
gen logX_theta=log(X)/theta

*Now doing the donut versions where we exclude nearby destinations

forvalues x=0/3 {
	gen logX_theta`x'=logX_theta*donut`x'
}


*Now construct independent variables for all cases
*Baseline
egen X_s = sum(X), by(year originId sectorId)
gen logX_s = log(X_s)
gen logL=log(X_s/nominal_wage_go)
label var logL "Log L_ik under the baseline assumption of labor as the only factor of production"

*Cleaning up, labeling variables
keep  originId destId sectorId year countryISO instrument* logX* logL* Qc ln_credit_banks real_wage_pwt theta* pop se_gyy

sort originId destId sectorId year
order originId destId sectorId year
 
save regression_data_donuts.dta, replace